25 research outputs found

Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments

Author: A Fraser
A Prakash
AI Nesvizhskii
BM Mayr
C Croux
CS Brown
DA Stead
E Machtejevas
Egidijus Machtejevas
F Model
GV Cohen Freue
Hartmut Schlüter
J Harezlak
J Listgarten
Joachim Thiemann
K Choo
K Flikka
K Pearson
KC Leptos
Klaus Unger
Knut Reinert
KR Coombes
M Bern
M Mann
M Sturm
M Xu
O Hössjer
O Kohlbacher
O Schulz-Trieglaff
O Schulz-Trieglaff
Ole Schulz-Trieglaff
P Mahalanobis
RE Moore
S Cappadona
S Na
T Whistler
W Windig
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Quality assessment methods that are commonplace in engineering and industrial production are not widely used in large-scale proteomics experiments. But modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important. Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.

Crossref

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

OpenMS – An open-source software framework for mass spectrometry

Author: A Keller
A Savitzky
Alexandra Zerck
Andreas Bertsch
Andreas Hildebrandt
BM Mayr
C Gröpl
CA Smith
CC Chang
Clemens Gröpl
D Ballard
DM Horn
DN Perkins
E Lange
E Lange
EA Kapp
Eva Lange
G Stockman
J Hartler
JB Breen
K Reinert
KC Leptos
Knut Reinert
LNN Mueller
LY Geer
M Bellew
M Katajamaa
Marc Sturm
ME Monroe
N Pfeifer
Nico Pfeifer
O Kohlbacher
O Schulz-Trieglaff
Ole Schulz-Trieglaff
Oliver Kohlbacher
P Soille
PGA Pedrioli
R Hussong
Rene Hussong
RG Sadygov
S Orchard
S Tanner
VB Di Marco
W Press
XJ Li
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. Results: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies. Conclusion: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website at http://www.openms.de.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Gutenberg Open

Prospects for a Statistical Theory of LC/TOFMS Data

Author: A David
A Ipsen
A Ipsen
A Lommen
CA Smith
D Barbacci
E Lange
EF Strittmatter
G Theodoridis
I Chernushevich
J Cox
J Li
J Li
KR Coombes
L Brodsky
M Anderle
M Guilhaus
M Katajamaa
M Katajamaa
O Kohlbacher
O Schulz-Trieglaff
P Coates
P Coates
P Du
R Feng
R Zubarev
RB Opsal
T Kind
Y He
Publication venue: Springer-Verlag
Publication date: 01/01/2012

The critical importance of employing sound statistical arguments when seeking to draw inferences from inexact measurements is well-established throughout the sciences. Yet fundamental statistical methods such as hypothesis testing can currently be applied to only a small subset of the data analytical problems encountered in LC/MS experiments. The means of inference that are more generally employed are based on a variety of heuristic techniques and a largely qualitative understanding of their behavior. In this article, we attempt to move towards a more formalized approach to the analysis of LC/TOFMS data by establishing some of the core concepts required for a detailed mathematical description of the data. Using arguments that are based on the fundamental workings of the instrument, we derive and validate a probability distribution that approximates that of the empirically obtained data and on the basis of which formal statistical tests can be constructed. Unlike many existing statistical models for MS data, the one presented here aims for rigor rather than generality. Consequently, the model is closely tailored to a particular type of TOF mass spectrometer, although the general approach carries over to other instrument designs. Looking ahead, we argue that further improvements in our ability to characterize the data mathematically could enable us to address a wide range of data analytical problems in a statistically rigorous manner.

Crossref

Springer - Publisher Connector

PubMed Central

Spiral - Imperial College Digital Repository

Improved quality control processing of peptide-centric LC-MS proteomics data

Author: Amy C. Sims
Anderson
Barnett
Bobbie-Jo M. Webb-Robertson
Bukhman
Caroni
Cho
Croux
Daly
Dixon
Filzmoser
Grubbs
Hawkins
Hoaglin
Jain
Jaitly
Joel G. Pounds
Jon M. Jacobs
Karpievitch
Katrina M. Waters
Kauffmann
Kemmeren
Lee
Li
MacCoss
Mahalanobis
Melissa M. Matzke
Metz
Monroe
Oberg
Oberg
Piening
Ralph S. Baric
Rocke
Rocke
Rudnick
Schulz-Trieglaff
Smith
Stead
Thomas O. Metz
Webb-Robertson
Wilson
Xia
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation for an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of the peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data, as many of the downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values.

Crossref

PubMed Central

Carolina Digital Repository

LC-MSsim – a simulation software for liquid chromatography mass spectrometry data

Abstract. Background: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms but no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms. Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

An Ultra-Fast Metabolite Prediction Algorithm

Author: A Duran
A Lommen
A Norbeck
A Robinson
B Efron
B Fischer
B Voss
C Sedgewick
C Smith
CD Broeckling
D De Souza
E Lange
E von Roepenack-Lahaye
F Matthäus
J de Groot
J Wong
Jérémie Bourdon
K Johnson
K Saito
K Saito
KM Oksman-Caldentey
L Wu
M Chae
M Katajamaa
M Robinson
M Sturm
MR Garey
Murray Grant
N Hoffmann
O Fiehn
O Schulz-Trieglaff
P Baldi
PJ DiMaggio
Q Ma
R Baran
R Biedendieck
R Powers
R Tibshirani
S Skiena
S Westergaard
T Conrads
T Okada
V Mapelli
V Perera
V Tusher
Zheng Rong Yang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Small molecules are central to all biological processes and metabolomics is becoming an increasingly important discovery tool. Robust, accurate and efficient experimental approaches are critical to supporting and validating predictions from post-genomic studies. To accurately predict metabolic changes and dynamics, experimental design requires multiple biological replicates and usually multiple treatments. Mass spectra from each run are processed and metabolite features are extracted. Because of machine resolution and variation in replicates, one metabolite may have different implementations (values) of retention time and mass in different spectra. A major impediment to effectively utilizing untargeted metabolomics data is ensuring accurate spectral alignment, enabling precise recognition of features (metabolites) across spectra. Existing alignment algorithms use either a global merge strategy or a local merge strategy. The former delivers an accurate alignment, but lacks efficiency. The latter is fast, but often inaccurate. Here we document a new algorithm employing a technique known as quicksort. The results on both simulated data and real data show that this algorithm provides a dramatic increase in alignment speed and also improves alignment accuracy.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Open Research Exeter

FigShare

Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view

Author: A Frank
A Michalski
AB Noyce
Andrew D Mathis
B Domon
B Fischer
BM Hemminger
C Bielow
CA Smith
CF Taylor
Dan Ventura
E Fahy
E Lange
EW Kraegen
H Liu
H Mischak
HC Köfeler
J Listgarten
J Samuelsson
JB German
JD Egertson
JE Elias
JK Eng
John T Prince
JW Wong
K Biemann
K Podwojski
K Schmelzer
KK Murray
L Feng
LN Mueller
LN Mueller
M Dakna
M Morris
M Sugimoto
MR Wenk
MY Brusniak
N Jeffries
O Fiehn
O Schulz-Trieglaff
PL Whetzel
R Smith
R Smith
R Smith
RB Cole
RJ Arnold
Rob Smith
S Cappadona
TM Annesley
VI Babushok
W Wang
WE Wolski
WJ Griffiths
X Han
XJ Li
Publication venue: Springer Science and Business Media LLC

Crossref

Computational pan-genomics: Status, promises and challenges

Author: Abeel T. (Thomas)
Alkan C. (Can)
Baaijens J.A. (Jasmijn)
Bakker P.I.W. (Paul) de
Boeva V. (Valentina)
Bonnal R.J.P. (Raoul)
Chiaromonte F. (Francesca)
Chikhi R. (Rayan)
Ciccarelli F.D. (Francesca)
Cijvat C.P. (Robin)
Datema E. (Erwin)
Dijkstra L.J. (Louis)
Duijn C.M. (Cornelia) van
Dutilh B.E. (Bas)
Eichler E.E. (Evan)
El-Kebir M. (Mohammed)
Ernst C. (Corinna)
Eskin E. (Eleazar)
Garrison E. (Erik)
Ghaffaari A. (Ali)
Guryev V. (Victor)
Kersey P. (Paul)
Klau G.W. (Gunnar)
Kloosterman W.P. (Wigard)
Korbel J.O. (Jan)
Lameijer E.-W. (Eric-Wubbo)
Langmead B. (Benjamin)
Marschall T. (Tobias)
Martin M. (Marcel)
Marz M. (Manja)
Medvedev P. (Paul)
Mu J.C. (John)
Mäkinen V. (Veli)
Neerincx P.B.T. (Pieter)
Novak A.M. (Adam)
Ouwens K. (Klaasjan)
Paten B. (Benedict)
Peterlongo P. (Pierre)
Pisanti N. (Nadia)
Porubsky D. (David)
Rahmann S. (Sven)
Raphael B.J. (Benjamin)
Reinert K. (Knut)
Ridder D. (Dick) de
Ridder J. (Jeroen) de
Rivals E. (Eric)
Sanders A.D. (Ashley)
Schlesner M. (Matthias)
Schulz-Trieglaff O. (Ole)
Schönhuth A. (Alexander)
Sheikhizadeh S. (Siavash)
Shneider C. (Carl)
Smit S. (Sandra)
The Computational Pan-Genomics Consortium
Valenzuela D. (Daniel)
Vandin F. (Fabio)
Wang J. (Jiayin)
Wessels L.F.A. (Lodewyk)
Ye K. (Kai)
Zhang Y. (Ying)
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations

CWI's Institutional Repository

Erasmus University Digital Repository